Receptor recognition by viruses is the first and essential step of viral infections of host cells. It is an important determinant of
viral host range and cross-species infection and a primary target for antiviral intervention. Coronaviruses recognize a variety of
host receptors, infect many hosts, and are health threats to humans and animals. The receptor-binding S1 subunit of coronavirus
spike proteins contains two distinctive domains, the N-terminal domain (S1-NTD) and the C-terminal domain (S1-CTD),
both of which can function as receptor-binding domains (RBDs). S1-NTDs and S1-CTDs from three major coronavirus genera
recognize at least four protein receptors and three sugar receptors and demonstrate a complex receptor recognition pattern. For
example, highly similar coronavirus S1-CTDs within the same genus can recognize different receptors, whereas very different
coronavirus S1-CTDs from different genera can recognize the same receptor. Moreover, coronavirus S1-NTDs can recognize
either protein or sugar receptors. Structural studies in the past decade have elucidated many of the puzzles associated with
coronavirus-receptor interactions. This article reviews the latest knowledge on the receptor recognition mechanisms of
coronaviruses and discusses how coronaviruses have evolved their complex receptor recognition pattern. It also summarizes important
principles that govern receptor recognition by viruses in general.


Coronaviruses (CoV) are a group of common, ancient, and diverse
viruses. They infect many mammalian and avian species
and cause respiratory, gastrointestinal, and central nervous system
diseases (1, 2). Coronavirus virions contain an envelope, a helical
capsid, and a single-stranded, positive-sense RNA genome. The
length of their genomes, which are the largest among all RNA viruses,
typically ranges between 27 and 32 kb. They were named "coronaviruses"
because of the protruding spike proteins on their envelope that
give the virions a crown-like shape ("corona" in Latin means crown).
Coronaviruses belong to the Coronaviridae family in the order
Nidovirales. They can be classified into at least three major genera,
α, β, and γ (formerly group 1, 2, and 3, respectively) (3). Prototypic
α-genus coronaviruses include human coronavirus NL63
(HCoV-NL63), porcine transmissible gastroenteritis coronavirus
(TGEV), and porcine respiratory coronavirus (PRCV). Prototypic
β-genus coronaviruses include severe acute respiratory syndrome
coronavirus (SARS-CoV), Middle East respiratory syndrome
coronavirus (MERS-CoV), mouse hepatitis coronavirus (MHV),
and bovine coronavirus (BCoV). Prototypic γ-genus coronaviruses
include avian infectious bronchitis virus (IBV). These three
major coronavirus genera and their prototypic coronaviruses are
the focus of this review article (Fig. 1).

Coronaviruses pose health threats to humans and animals.
Two β-coronaviruses, SARS-CoV and MERS-CoV, are highly pathogenic
human pathogens. SARS-CoV caused the SARS epidemic in
2002 to 2003, with over 8,000 infections and a fatality rate of ~10%
(4-7). MERS-CoV emerged from the Middle East in 2012. As of 16
October 2014, MERS-CoV had caused 877 infections with a fatality
rate of ~36% (http://www.who.int/csr/don/16-october-2014-mers/en/)
(8, 9). HCoV-NL63 from the α-genus is a prevalent human
respiratory pathogen that is often associated with common colds in
healthy adults and acute respiratory diseases in young children (10,
11). Among the animal coronaviruses, TGEV from the α-genus and
MHV from the β-genus cause close to 100% fatality in young pigs
and young mice, respectively (12-15); BCoV from the β-genus and
IBV from the γ-genus also cause significant health damage in cattle
and chickens, respectively (16-19). Therefore, research on coronaviruses
has strong health and economic implications.

Receptor recognition by viruses is the first and essential step of 
viral infections of host cells (20). An envelope-anchored spike protein 
mediates coronavirus entry into host cells by first binding to a recep- 
tor on the host cell surface and then fusing viral and host membranes 
(21, 22). A member of the class I viral membrane fusion proteins 
(23-26), the coronavirus spike consists of three segments—an ect- 
odomain, a single-pass transmembrane anchor, and a short intracel- 
lular tail (27, 28). The ectodomain can be divided into a receptor- 
binding S1 subunit and a membrane-fusion S2 subunit. The amino 
acid sequences of S1 diverge across different genera but are relatively 
conserved within each genus (29). S1 contains two independent do- 
mains, an N-terminal domain (S1-NTD) and a C-terminal domain 
(S1-CTD, also called S1 C-domain) (Fig. 1) (29). Either or both of 
these S1 domains can function as a receptor-binding domain (RBD). 
The binding interaction between coronavirus RBD and its receptor is 
one of the most important determinants of the coronavirus host 
range and cross-species infection (2, 30). In addition, coronavirus 
RBDs contain major neutralization epitopes, induce most of the host 
immune responses, and may serve as subunit vaccines against coro- 
navirus infections (31-36). Knowledge about the receptor recogni- 
tion mechanisms of coronaviruses is critical for understanding coro- 
navirus pathogenesis and epidemics and for human intervention in 
coronavirus infections. 

Coronaviruses recognize a variety of host receptors (Fig. 1).
Although HCoV-NL63 and SARS-CoV belong to the α-genus and
β-genus, respectively, their S1-CTDs recognize the same receptor,
angiotensin-converting enzyme 2 (ACE2) (37-43). Although
HCoV-NL63, TGEV, and PRCV all belong to the α-genus, their
S1-CTDs recognize different receptors: TGEV and PRCV S1-CTDs
both recognize aminopeptidase N (APN) (44, 45). Similarly,
although SARS-CoV and MERS-CoV both belong to the
β-genus, their S1-CTDs recognize different receptors: MERS-CoV
S1-CTD recognizes dipeptidyl peptidase 4 (DPP4) (46-48).
Although MHV and BCoV both belong to the β-genus, their S1-NTDs
recognize carcinoembryonic antigen-related cell adhesion
molecule 1 (CEACAM1) and sugar, respectively (49-53). In addition,
the S1-NTDs of α-genus TGEV and γ-genus IBV also recognize
sugar (52, 54-58). Overall, coronaviruses have evolved a
complex receptor recognition pattern: (i) coronaviruses use one
or both S1 domains as RBDs; (ii) highly similar coronavirus S1-CTDs
within the same genus can recognize different protein receptors,
whereas very different coronavirus S1-CTDs from different
genera can recognize the same protein receptor; and (iii)
coronavirus S1-NTDs can recognize either protein or sugar receptors.
Understanding the receptor recognition mechanisms of
coronaviruses can provide critical insight into the origin, evolution,
and receptor selection of coronaviruses.

FIG 1 Receptor recognition pattern of coronaviruses.

In addition to their viral receptor functions, the receptors for 
coronaviruses have their own physiological functions. ACE2 is a 
zinc-dependent carboxypeptidase that cleaves one residue from 
the C terminus of angiotensin peptides and functions in blood 
pressure regulation (59-62). ACE2 also protects against severe 
acute lung failure, and SARS-CoV-induced downregulation of 
ACE2 promotes lung injury (63, 64). APN is a zinc-dependent 
aminopeptidase that cleaves one residue from the N terminus of 
many physiological peptides and plays multifunctional roles such 
as in pain regulation, blood pressure regulation, and tumor cell 
angiogenesis (65, 66). DPP4 is a serine exoprotease that cleaves 
two residues from the N terminus of many physiological peptides 
and functions in immune regulation, signal transduction, and 
apoptosis (67-70). CEACAM1 is a cell adhesion molecule and 
functions in cell-cell adhesion (71-73). Sugars decorate many 
proteins and fats on cell surfaces and function in many biological
processes such as immunity and cell-cell communication (57, 74, 
75). How these cell-surface molecules are selected by viruses as 
their entry receptors has been a major puzzle in virology. 

Analyses of crystal structures of coronavirus S1 domains and 
their complexes with their respective receptor have elucidated 
many puzzles associated with coronavirus-receptor interactions. 
Since the SARS epidemic, the crystal structures of five coronavirus 
S1 domains complexed with their respective receptor have been 
determined. These are the β-genus SARS-CoV S1-CTD complexed
with human ACE2 (76), β-genus MERS-CoV S1-CTD
complexed with human DPP4 (77, 78), α-genus HCoV-NL63 S1-CTD
complexed with human ACE2 (79), α-genus PRCV S1-CTD
complexed with porcine APN (80), and β-genus MHV S1-NTD
complexed with murine CEACAM1 (81). In addition, the crystal
structure of β-genus BCoV S1-NTD by itself has been determined,
with its sugar-binding site identified through mutagenesis (53).
These six representative structures not only reveal how coronaviruses
recognize their receptors in atomic detail but also shed light
on how coronaviruses do so using complicated evolutionary strategies.
Other than these six representative structures, several variant
forms of these structures have also been determined, including
S1-CTDs of different SARS-CoV strains complexed with ACE2 
from animals and S1-CTD of a MERS-CoV-related bat coronavi- 
rus HKU4 complexed with human DPP4 (82-84). This article 
reviews these structural studies and their implications for the re- 
ceptor recognition and evolution of coronaviruses. 


S1-CTDs OF β-GENUS CORONAVIRUSES


β-Genus SARS-CoV S1-CTD complexed with human ACE2 was
the first crystal structure determined for a coronavirus S1 domain
and S1 domain/receptor complex (Fig. 2A) (76, 85). SARS-CoV
S1-CTD contains two subdomains: a core structure and an
extended loop. The core structure consists of a five-stranded
antiparallel β-sheet and several short connecting α-helices. The
extended loop lies on one edge of the core structure and forms a
gently concave surface with two ridges on both sides and a
two-stranded antiparallel β-sheet sitting in the middle (Fig. 3A and B).
Because this extended loop makes all the contacts with ACE2, it 
has been termed receptor-binding motif (RBM). On the other 
hand, the peptidase domain of ACE2 has a claw-like structure with 
two lobes. The enzymatic active site of ACE2 is buried in a cavity 
surrounded by the two lobes. SARS-CoV S1-CTD binds to the 
outer surface of the N-terminal lobe, away from the peptidase 
active site. Consequently, SARS-CoV binding has no effect on the 
enzymatic activity of ACE2 and vice versa. The SARS-CoV-bind- 
ing region on the ACE2 surface has been termed virus-binding 
motif (VBM). The RBM and VBM complement each other in 
shape and chemical details. The structure of SARS-CoV S1-CTD/ 
ACE2 complex provided the first view of coronavirus S1 and S1/ 
receptor complex and laid the foundation for future structural 
and evolutionary comparisons with other coronavirus S1 and S1/ 
receptor complexes. 

Comparative studies of the interactions between the S1-CTD
from different SARS-CoV strains and ACE2 from different host 
species have elucidated the molecular and structural mechanisms 
by which SARS-CoV transmitted from animals to humans and 
caused the SARS epidemic (30, 83, 84, 86-89). Two virus-binding 
hot spots have been identified in the VBM of ACE2, one centering 
on ACE2 residue Lys31 and the other centering on ACE2 residue 
FIG 2 Crystal structures of coronavirus S1-CTDs complexed with their respective receptor. (A to D) These structures include β-genus SARS-CoV S1-CTD
complexed with ACE2 (Protein Data Bank identifier [PDB ID]: 2AJF) (76) (A), β-genus MERS-CoV S1-CTD complexed with DPP4 (PDB ID: 4KR0) (77) (B),
α-genus HCoV-NL63 S1-CTD complexed with ACE2 (PDB ID: 3KBH) (79) (C), and α-genus PRCV S1-CTD complexed with APN (PDB ID: 4F5C) (80) (D).
In the structures of the complexes, the receptors are in green, and the cores and RBMs of S1-CTDs are in cyan and red, respectively. (E and F) The structural
topologies of the four coronavirus S1-CTDs are shown as schematic illustrations, where β-strands are depicted as arrows and α-helices as cylinders. In the tertiary
structures and structural topologies of the S1-CTDs, the secondary structures of all of the S1-CTDs are colored and numbered in the same way as for HCoV-NL63
S1-CTD.


Lys353 (Fig. 3C and D). Both of these virus-binding hot spots 
consist of a salt bridge that is buried in a hydrophobic environ- 
ment. Structure-guided functional studies revealed that both vi- 
rus-binding hot spots provide significant energy to the virus-re- 
ceptor binding interactions (90). Indeed, all of the naturally 
selected viral mutations in SARS-CoV RBM surround the two hot 
spots, with significant impact on the structures of the hot spots, 
the ACE2 binding affinity, and the host immune responses (84, 
91). One of these viral mutations, K479N, facilitated transmission 
of SARS-CoV from palm civets to humans. Another viral mutation,
S487T, facilitated transmission of SARS-CoV from human to
human. These two mutations contributed significantly to the 
SARS epidemic in 2002 to 2003. The S1-CTD of a SARS-CoV- 
related Rs3367 bat coronavirus contains two asparagines at these 
two positions (corresponding to positions 479 and 487 in human 
SARS-CoV strains) (92). The first asparagine is favorable for hu- 
man ACE2 binding, and the second one is less favorable. Thus, 
Rs3367 recognizes human ACE2 but probably less well than the 
human SARS-CoV strains do. For more details about how the
structural analysis of SARS-CoV RBD/ACE2 interactions has pro- 
vided insight into the SARS epidemic, please refer to another re- 
cent review article on this topic (30). These structural studies of 
SARS-CoV S1-CTD/ACE2 interactions demonstrate that it is crit- 
ical to understand viral evolution, cross-species transmission, and 
epidemics within a detailed structural framework. 

The crystal structures of β-genus MERS-CoV S1-CTD by itself
and in complex with human DPP4 provided another view of
coronavirus S1 and S1/receptor complex (Fig. 2B) (77, 78, 93). Like
SARS-CoV S1-CTD, MERS-CoV S1-CTD also contains a core
structure and an RBM. The core structures of MERS-CoV and
SARS-CoV S1-CTDs are highly similar to each other, but their
RBMs are markedly different, leading to different receptor
specificities. The RBM of MERS-CoV S1-CTD mainly consists of a
four-stranded β-sheet, in contrast to the loop-dominated RBM in
SARS-CoV S1-CTD. Like the VBM for SARS-CoV on ACE2, the
VBM for MERS-CoV is also located on the outer surface of DPP4,
FIG 3 Virus-binding hot spots on ACE2 that are critical for the binding of SARS-CoV and HCoV-NL63. (A) Enlarged view of the SARS-CoV/ACE2 interface.
VBMs on ACE2 and RBM on SARS-CoV S1-CTD are in blue and red, respectively. (B) Footprint of SARS-CoV on the surface of ACE2. The view is derived from
the one in panel A by rotating ACE2 by 90° along a horizontal axis in such a way that the edge facing the viewer moves up. VBM1 residues are in orange, VBM2
residues in magenta, VBM3 residues in red, and VBM 1b residues in green. (C) A virus-binding hot spot on ACE2 centering on Lys31 is critical for the binding
of SARS-CoV S1-CTD. Mutation of residue 479 on SARS-CoV S1-CTD is critical for the transmission of SARS-CoV from palm civets to humans. (D) A second
virus-binding hot spot on ACE2 centering on Lys353 is also critical for the binding of SARS-CoV S1-CTD. Mutation of residue 487 on SARS-CoV S1-CTD is
critical for the transmission of SARS-CoV from human to human. (E) Enlarged view of the HCoV-NL63/ACE2 interface. (F) Footprint of HCoV-NL63 on the
surface of ACE2. (G) The same virus-binding hot spot on ACE2 centering on Lys353 is also critical for the binding of HCoV-NL63 S1-CTD.


away from the peptidase active site. Whereas the conserved core 
structures of SARS-CoV and MERS-CoV S1-CTDs suggest a com- 
mon evolutionary origin, the different RBMs of the two S1-CTDs 
indicate a divergent evolutionary pathway that has led to their 
recognition of different host receptors. The S1-CTDs of MERS-CoV
and a highly related bat coronavirus HKU4 recognize DPP4
in very similar ways, suggesting a close evolutionary relationship 
between the two viruses (82, 94). In addition to enhancing the 
understanding of coronavirus evolution, the structure of MERS- 
CoV S1-CTD/DPP4 complex has important implications for un- 
derstanding the host range and cross-species transmission of 
MERS-CoV (82, 94-97). 


S1-CTDs OF α-GENUS CORONAVIRUSES

α-Genus HCoV-NL63 S1-CTD complexed with human ACE2
was the first crystal structure determined for an α-coronavirus S1
domain (Fig. 2C) (79). This structure, along with the structure of
β-genus SARS-CoV S1-CTD complexed with ACE2, provided the
first view of how two different viruses recognize their common
host receptor. The finding was intriguing. At first glance, HCoV-NL63
and SARS-CoV S1-CTDs are very different. The core structure
of HCoV-NL63 S1-CTD is a β-sandwich consisting of two
β-sheet layers stacked together through hydrophobic interactions,
which is in contrast to the single β-sheet layer in the core
structure of SARS-CoV S1-CTD. Their RBMs are also different.
The RBMs of HCoV-NL63 S1-CTD are three short and discontinuous
loops, whereas the RBM of SARS-CoV S1-CTD is a single
long and continuous subdomain. Indeed, the protein-folding Dali
server failed to detect any structural similarity between HCoV-NL63
and SARS-CoV S1-CTDs (98). However, structural topology
analysis revealed that the secondary structural elements in
HCoV-NL63 S1-CTD are connected in the same way as those in
SARS-CoV S1-CTD, although two β-strands in the former
(strands β-1 and β-4) become α-helices in the latter (helices α-1
and α-4) and another β-strand (strand β-1) in the former is missing
altogether in the latter (Fig. 2E and F) (29). These results
suggest that HCoV-NL63 and SARS-CoV S1-CTDs share an evolutionary
origin and that the structural differences between the
two S1-CTDs result from extensive divergent evolution.

Despite their different tertiary structures, HCoV-NL63 and 
SARS-CoV S1-CTDs bind to a common region on ACE2 (79, 90). 
The VBMs for the two viruses on ACE2 overlap, and a number of 
ACE2 residues interact with both S1-CTDs (Fig. 3E and F). Surprisingly,
one of the two virus-binding hot spots on ACE2 for
FIG 4 Proposed origin and evolution of coronavirus S1-CTDs. The question
mark indicates possible tertiary structures of γ-coronavirus S1-CTD.


SARS-CoV binding, which centers on ACE2 residue Lys353, plays 
a similarly critical role in the binding of HCoV-NL63 (Fig. 3G). 
Disturbance of the hot spot structure via mutagenesis decreased or 
abolished the binding of both viruses. Hence, Lys353 and the 
nearby residues on ACE2 form a common virus-binding hot spot 
that is critical for the attachment of two different coronaviruses. 
On the other hand, among the three RBMs in HCoV-NL63 S1- 
CTD, only RBM1 and RBM2, but not RBM3, are involved in bind- 
ing the common virus-binding hot spot on ACE2, despite the fact 
that RBM3 is topologically equivalent to the RBM in SARS-CoV 
S1-CTD (Fig. 2A, C, E, and F). The different molecular mecha- 
nisms used by the two S1-CTDs to recognize ACE2 suggest a con- 
vergent evolutionary relationship between the two S1-CTDs (i.e., 
the two S1-CTDs evolved independently to recognize the same 
virus-binding hot spot on ACE2), although a divergent evolutionary
relationship cannot be completely ruled out (i.e., the two
S1-CTDs both evolved from a common ancestral protein that 
bound ACE2). Therefore, after HCoV-NL63 and SARS-CoV S1- 
CTDs underwent divergent evolution to attain different struc- 
tures, they might have further converged to recognize the same 
region on the same receptor. The common virus-binding hot spot 
on ACE2 might be the driving force for this later convergent evo- 
lution. 

The crystal structure of α-genus PRCV S1-CTD complexed
with porcine APN illustrated how another similar α-coronavirus
S1-CTD recognizes a different host receptor (Fig. 2D) (80).
Similarly to the structural relationship between SARS-CoV and
MERS-CoV S1-CTDs, PRCV and HCoV-NL63 S1-CTDs also 
have highly similar core structures. However, their three RBMs are 
divergent, leading to different receptor specificities. Similarly to 
the VBMs on ACE2 and DPP4, the VBMs for PRCV on APN are 
also located on the outer surface of APN, away from the peptidase 
active site. Overall, these results suggest that PRCV and HCoV- 
NL63 S1-CTDs share an evolutionary origin but have diverged in 
their RBM loops to recognize different host receptors. 

We propose the following evolutionary scenario for coronavirus
S1-CTDs (Fig. 4). All coronavirus S1-CTDs likely shared one
evolutionary origin, as evidenced by their related structural topologies
across different genera (Fig. 2E and F). Through divergent
evolution, coronavirus S1-CTDs attained β-sandwich core structures
in the α-genus and β-sheet core structures in the β-genus.
Although the structures of γ-coronavirus S1-CTDs are not
known, their core structures may also have a topology related to
those of α- and β-coronavirus S1-CTDs. Furthermore, α-coronavirus
S1-CTDs diverged in the three RBM loops to acquire different
receptor specificities: ACE2 specificity for HCoV-NL63 and
APN specificity for PRCV. β-Coronavirus S1-CTDs also diverged
in the RBM subdomain to acquire different receptor specificities:
ACE2 specificity for SARS-CoV and DPP4 specificity for
MERS-CoV. The S1-CTDs of α-genus HCoV-NL63 and β-genus
SARS-CoV first diverged into different tertiary structures but later
converged to recognize the same receptor ACE2. In sum, coronavirus
S1-CTDs have undergone convoluted structural evolutions,
leading to their complex receptor recognition pattern.

FIG 5 Crystal structures of coronavirus S1-NTDs. (A and B) These structures
include β-genus MHV S1-NTD complexed with its receptor CEACAM1 (PDB
ID: 3R4D) (81) (A) and β-genus BCoV S1-NTD by itself (PDB ID: 4H14) (53)
(B). (C) The structure of human galectin-3 (PDB ID: 1A3K) is shown as a
comparison. In the structure of the MHV S1-NTD-CEACAM1 complex, the
VBM on CEACAM1 and the RBM on MHV S1-NTD are in blue and red,
respectively. In the structure of BCoV S1-NTD, the sugar-binding pocket as
identified by mutagenesis studies is indicated by a five-pointed star. (D and E)
The structural topologies of the two coronavirus S1-NTDs (D) and human
galectin-3 (E) are shown as schematic illustrations, where β-strands are
depicted as arrows and α-helices as cylinders. In the structural topologies of these
proteins, the secondary structures are colored and numbered in the same way
as for the MHV S1-NTD.


S1-NTDs OF β-GENUS CORONAVIRUSES


β-Genus MHV S1-NTD complexed with mouse CEACAM1 was
the first structure available for a coronavirus S1-NTD and S1- 
NTD/receptor complex (Fig. 5A) (81). Surprisingly, MHV S1- 
NTD contains a core structure that has the same structural fold as 
human galectins (galactose-binding lectins) (Fig. 5C) (99). The 
core structure of MHV S1-NTD is a thirteen-stranded β-sandwich
consisting of two β-sheet layers of six and seven strands,
respectively. The structural topologies of MHV S1-NTD and human
galectins are identical, except that MHV S1-NTD contains
two additional β-strands in one of the β-sheet layers (Fig. 5D and
E). Compared with human galectins, MHV S1-NTD contains ad- 
ditional structural motifs on top of the core that form a ceiling-like 
structure. The outer surface of this ceiling-like structure functions 
as RBM by binding to the VBM on the N-terminal Ig-like domain 
of CEACAM1. Despite its galectin fold, MHV S1-NTD does not 
bind sugars, as revealed by sugar-binding assays. Moreover, nei- 
ther the RBM on MHV S1-NTD nor the VBM on CEACAM1 
contains any sugar at the binding interface. Instead, MHV S1- 
NTD binds to CEACAM1 through exclusive protein-protein in- 
teractions. A hydrophobic patch in the VBM of CEACAM1 func- 
tions as a virus-binding hot spot; mutations in this region 
significantly decreased the binding of MHV S1-NTD (81, 100- 
102). Taken together, these results suggest that MHV S1-NTD and 
host galectins share the same evolutionary origin; they also indi- 
cate that although MHV S1-NTD binds only a CEACAM 1 protein 
receptor, other coronavirus S1-NTDs may bind sugar receptors 
and function as viral lectins. 

Analysis of the crystal structure of β-genus BCoV S1-NTD
provided the first view of a functional lectin domain in a corona- 
virus spike (Fig. 5B) (53). The overall structure of BCoV S1-NTD 
is highly similar to that of MHV S1-NTD, also containing a galec- 
tin-like core and a ceiling-like structure on top of the core. In 
contrast to MHV S1-NTD, which binds CEACAM1 but not sug- 
ars, BCoV S1-NTD binds a sugar receptor but not CEACAM1. 
Glycan screen arrays identified Neu5,9Ac2 (5-N-acetyl-9-O- 
acetylneuraminic acid) as the sugar receptor for BCoV S1-NTD. 
Although the structure of a sugar-bound BCoV S1-NTD is not 
available, structure-guided mutagenesis has revealed that the sug- 
ar-binding site is located in a pocket surrounded by the core and 
the ceiling-like structure on top of the core. The sugar-binding 
sites in BCoV S1-NTD and human galectins overlap, although 
human galectins recognize a different sugar receptor, galactose. 
Structural comparison between MHV and BCoV S1-NTDs re- 
vealed that subtle structural changes between the two S1-NTDs, 
mainly involving different conformations of RBM loops, explain 
why BCoV S1-NTD does not bind CEACAM1 and why MHV 
S1-NTD does not bind sugars. These results suggest that MHV 
and BCoV S1-NTDs are both evolutionarily related to human 
galectins but that they have diverged from human galectins with 
specificities for a novel protein receptor and a different sugar re- 
ceptor, respectively. 

We propose the following evolutionary scenario for coronavi- 
rus S1-NTDs (Fig. 6). Ancestral coronaviruses stole a host galectin 
gene and inserted it into the 5’ end of their spike gene, which 
became coronavirus S1-NTD. Since then, coronavirus S1-NTDs 
have undergone divergent evolution in three genera. β-Genus
BCoV S1-NTD has kept the lectin activity but evolved specificity
for a different sugar receptor, Neu5,9Ac2. Although the crystal
structures of α- and γ-coronavirus S1-NTDs are not available,
they may also have the galectin fold for the following reasons.
First, the conserved structural topology of S1-CTDs across different
coronavirus genera strongly suggests a similarly conserved
structural topology of S1-NTDs across different coronavirus genera.
Second, the S1-NTDs of both α-genus TGEV and γ-genus
IBV function as lectins, although the former recognizes both N- 
glycolylneuraminic acid (Neu5Gc) and N-acetylneuraminic acid 
(Neu5Ac) and the latter recognizes Neu5Gc. Hence, sugar-bind- 
ing S1-NTDs across different coronavirus genera may share the 


February 2015 Volume 89 Number 4 


Journal of Virology 


Minireview 


FIG 6 Proposed origin and evolution of coronavirus S1-NTDs. Orange arrows
indicate the locations of CEACAM1 or sugar that binds coronavirus
NTDs. Question marks indicate the postulated structures of hypothetical
evolutionary intermediates. Panel labels: host galectin (binds sugar, not
CEACAM1); NTD of ancestral CoV (binds sugar, not CEACAM1); evolutionary
intermediate (binds both CEACAM1 and sugar); BCoV NTD (binds sugar, not
CEACAM1); MHV NTD (binds CEACAM1, not sugar).


same galectin fold but have diverged to recognize different sugar 
receptors. On the other hand, B-genus MHV S1-NTD has evolved 
specificity for a novel protein receptor, CEACAM1. Subsequently, 
MHV S1-NTD lost its lectin activity because proteins in general
have advantages over sugars as viral receptors by providing higher 
affinity and specificity for viral attachment. 

Are coronaviruses the only viruses that stole a host lectin and 
integrated it into their spike? A survey of viral lectins with known 
tertiary structures revealed that galectin-like domains are present 
in a variety of viral spikes, including influenza virus hemaggluti- 
nin, whose galectin-like fold was previously unknown (24, 103). 
Moreover, these viral lectins display diverse sugar-binding modes, 
but they share a feature—their sugar-binding sites are all located 
in cavities and are not easily accessible to host antibodies and 
immune cells. As a comparison, the sugar-binding sites in host 
galectins are open and easily accessible (Fig. 5C). It was thus hy- 
pothesized that these viral lectins all originated from host galectins 
but have evolved to use hidden sugar-binding sites to evade host 
immune surveillance (104). The above analysis may explain why 
coronavirus S1-NTDs have evolved the ceiling-like structure on 
top of the core, which is used to protect the sugar-binding site in 
coronavirus S1-NTDs from the host immune system. Subsequently,
MHV S1-NTD took advantage of the ceiling-like structure
and evolved CEACAM1-binding RBM on the outer surface of
this ceiling-like structure. In this sense, the evolution of 
CEACAM1-binding RBM in MHV S1-NTD might be an indirect 
outcome of the efforts of coronaviruses to battle the host immune 
attacks. 


RECEPTOR BINDING BY CORONAVIRUSES 


So far, we have reviewed the receptor recognition and evolution of 
coronavirus S1-NTDs and S1-CTDs separately. How do S1-NTDs
and S1-CTDs work together in the receptor recognition and evo- 
lution of coronavirus spikes? Electron microscopic studies of the 
SARS-CoV spike revealed that it is a clove-shaped trimer, with 
three individual S1 heads and a trimeric S2 stalk (Fig. 7) (27, 28). 
ACE2 binds to the tip of the SARS-CoV spike trimer, where S1- 
CTD is located. Because the membrane-distal tips of the trimeric 
FIG 7 Summary of the receptor recognition mechanisms of coronaviruses in
a three-dimensional view. The overall structure of trimeric SARS-CoV spike
complexed with ACE2 is shown; it includes both the schematic topology of the
spike and the negative-stain electron microscopic images of the spike ectodomain
(upper right). TM, transmembrane anchor. IC, intracellular tail. The structures
and functions of coronavirus S1 domains are listed. The question marks
indicate possible tertiary structures of coronavirus S1 domains.


spike are the most exposed and protruding region on the whole 
spike, S1-CTD is directly exposed to the host immune system, 
evolves at an increased pace to evade the host immune surveil- 
lance, and becomes hypervariable in primary, secondary, and ter- 
tiary structures. The RBM of S1-CTD is located on the very tip of 
the trimeric spikes and evolves at the fastest pace. On the other 
hand, S1-NTD is likely located underneath S1-CTD, is less ex- 
posed to the host immune system, and evolves at a slower pace 
than S1-CTD. Therefore, between the two S1 domains, the more 
conserved S1-NTDs may function as the more reliable RBDs that 
recognize sugar receptors, allowing coronaviruses to search for 
additional and high-affinity protein receptors using their fast- 
evolving S1-CTDs. Such dual-RBD structures in coronavirus 
spikes may give coronaviruses an evolutionary advantage in find- 
ing new receptors and expanding their host ranges. 

Why were specific host cell surface molecules selected as 
coronavirus receptors? Among the known coronavirus receptors, sugars 
are probably the primordial and fallback receptors for coronaviruses. 
Sugars are abundant on host cell surfaces and are easy 
targets for viruses to grab. To use sugars as their receptors, a variety 
of viruses might have stolen a host galectin and used it as a viral 
lectin. On the other hand, using protein receptors may enhance 
the affinity and specificity of viral attachment, increase the efficiency 
of viral entry, and help viruses expand their host 
ranges and alter their tropisms (105). Host cell surface proteins 
share some common features as viral receptors. First, they frequently 
undergo endocytosis, which facilitates viral entry. Second, 
they contain VBMs on their surfaces for high-affinity virus binding. 
In the VBMs of both ACE2 and CEACAM1, virus-binding hot 
spots have been identified and contribute significant energy to 
virus/receptor binding interactions (79, 81, 90). Therefore, host 
cell surface molecules are not randomly selected by viruses as their 
receptors. In fact, there are structural and evolutionary reasons 
behind these selections by viruses. 


CONCLUDING REMARKS 


The structural studies of coronavirus-receptor interactions described 
above have established the following virology principles. 
First, drastic structural changes in viral RBDs can still lead to recognition 
of a virus-binding hot spot on the same receptor protein. 
Supporting this principle is the finding that SARS-CoV and 
HCoV-NL63 recognize a common virus-binding hot spot on 
ACE2 using structurally divergent S1-CTDs. Second, subtle structural 
changes in viral RBDs can lead to a complete receptor switch. 
For example, HCoV-NL63 and PRCV recognize two different 
protein receptors using structurally conserved S1-CTDs with divergent 
RBMs, and so do SARS-CoV and MERS-CoV. Moreover, 
MHV and BCoV S1-NTDs recognize a protein receptor and a 
sugar receptor, respectively, through subtle conformational 
changes in receptor-binding loops. Third, it is a successful viral 
strategy to steal a host protein and evolve it into viral RBDs with 
novel protein receptor specificities or altered sugar receptor specificities. 
For example, MHV and BCoV S1-NTDs have the same 
structural fold as human galectins, but they recognize a novel protein 
receptor and a different sugar receptor, respectively. Fourth, a 
few residue changes at the receptor binding interface can lead to 
efficient cross-species infection and human-to-human transmission 
of a virus. For example, SARS-CoV needed only one or two 
mutations in its RBD to transmit from palm civets to humans. 
These virology principles may be extended from the Coronaviridae 
family to other virus families. 

What are the remaining important questions regarding receptor 
recognition mechanisms of coronaviruses? First, what are the 
crystal structures of α-coronavirus S1-NTDs, γ-coronavirus S1-NTDs, 
and γ-coronavirus S1-CTDs? We have hypothesized that 
α-coronavirus and γ-coronavirus S1-NTDs have a galectin fold 
and that γ-coronavirus S1-CTDs have either a β-sandwich fold or 
a β-sheet fold. These hypotheses need to be tested using experimentally 
determined crystal structures of these S1 domains. Second, 
what are the detailed sugar-binding mechanisms for coronavirus 
S1-NTDs? The crystal structures of coronavirus S1-NTDs 
complexed with sugar receptors will reveal how sugar receptor 
specificities are achieved in these viral lectins across different 
coronavirus genera. Third, why do coronaviruses rely on peptidases as 
their receptors? Three of the four known protein receptors for 
coronaviruses are peptidases: ACE2, APN, and DPP4. They are all 
recognized by S1-CTDs of different coronaviruses. It is highly 
unlikely that the use of peptidases as coronavirus receptors is simply 
a coincidence. On the other hand, these receptors' peptidase 
activities have no effects on coronavirus entry, indicating that 
their common physiological function in degrading peptides was 
not the reason why they were selected as coronavirus receptors. To 
fully understand why peptidases were selected as receptors for 
coronaviruses, it will be important in the future to comprehensively 
examine the physiological functions of these peptidase receptors. 
Last, what was the evolutionary origin of coronavirus 
S1-CTDs? So far, coronavirus S1-CTDs appear to have a novel 
fold not related to any other proteins in the protein structure 
database. However, our previous structural studies of coronavirus 
spikes repeatedly showed that tertiary structures of viral proteins 
can deceive the currently available tertiary structural analysis software 
(98). Instead, our structural topology analysis is a powerful 
tool to identify structural homology among viral proteins (29, 
103). This approach may help identify the evolutionary origin of 
coronavirus S1-CTDs. To sum up, structural studies in the past 
decade have elucidated many puzzles surrounding receptor recognition, 
evolution, and cross-species transmission of coronaviruses. 
Future structural studies will continue to solve the remaining 
puzzles as well as new puzzles that may emerge regarding the 
receptor recognition mechanisms of coronaviruses. 